option 5: To recognize the temporal instances of different sounds in an audio signal